Characterizing workflow-based activity on a production e-infrastructure using provenance data

Authors

  • Souley Madougou
  • Shayan Shahand
  • Mark Santcroos
  • Barbera D. C. van Schaik
  • Ammar Benabdelkader
  • Antoine H. C. van Kampen
  • Sílvia Delgado Olabarriaga
Abstract

Grid computing and workflow management systems emerged as solutions to the challenges arising from the processing and storage of the sheer volumes of data generated by modern simulations and data acquisition devices. Workflow management systems usually document the process of workflow execution either as structured provenance information or as log files. Provenance is recognized as an important feature in workflow management systems; however, there are still few reports on its usage in practical cases. In this paper we present the provenance system implemented in our platform, and then use the information captured by this system during 8 months of platform operation to analyze the platform usage and to perform multilevel error pattern analysis. We exploit the large amount of structured data, using the explanatory potential of statistical approaches, to find properties of workflows, jobs and resources that are related to workflow failure. Such an analysis enables us to characterize workflow executions on the infrastructure and to understand workflow failures. The approach is generic and applicable to other e-infrastructures to gain insight into operational incidents.

Similar resources

Towards Next Generation Provenance Systems for e-Science

e-Science helps scientists to automate scientific discovery processes and experiments, and promote collaboration across organizational boundaries and disciplines. These experiments involve data discovery, knowledge discovery, integration, linking, and analysis through different software tools and activities. Scientific workflow is one technique through which such activities and processes can be...

Managing the Deluge of Scientific Data

Provenance information in eScience is metadata that's critical to effectively manage the exponentially increasing volumes of scientific data from industrial-scale experiment protocols. Semantic provenance, based on domain-specific provenance ontologies, lets software applications unambiguously interpret data in the correct context. The semantic provenance framework for eScience data comprises e...

Using Cloud-Aware Provenance to Reproduce Scientific Workflow Execution on Cloud

Provenance has been thought of as a mechanism to verify a workflow and to provide workflow reproducibility. This provenance of scientific workflows has been effectively carried out in Grid-based scientific workflow systems. However, the recent adoption of Cloud-based scientific workflows presents an opportunity to investigate the suitability of existing approaches or propose new approaches to collect p...

Linked provenance data: A semantic Web-based approach to interoperable workflow traces

The Third Provenance Challenge (PC3) offered an opportunity for provenance researchers to evaluate the interoperability of leading provenance models with special emphasis on importing and querying workflow traces generated by others. We investigated interoperability issues related to reusing Open Provenance Model (OPM)-based workflow traces. We compiled data about interoperability issues that w...

Journal title:
  • Future Generation Comp. Syst.

Volume 29  Issue

Pages  -

Publication date 2013